An SCP-based heuristic approach for scheduling distributed data-intensive applications on global grids
Authors
Abstract
Data-intensive Grid applications need access to large data sets that may each be replicated on different resources. Minimizing the overhead of transferring these data sets to the resources where the applications are executed requires that appropriate computational and data resources be selected. In this paper, we consider the problem of scheduling an application composed of a set of independent tasks, each of which requires multiple data sets that are each replicated on multiple resources. We break this problem into two parts: first, matching each task (or job) to one compute resource for executing the job and to one storage resource for each data set the job requires; and second, assigning the set of tasks to the selected resources. We model the first part as an instance of the well-known Set Covering Problem (SCP) and apply a known heuristic for SCP to match jobs to resources. The second part is tackled by extending the existing MinMin and Sufferage algorithms to schedule the set of distributed data-intensive tasks. Through simulation, we experimentally compare the SCP-based matching heuristic to others in conjunction with the task scheduling algorithms and present the results. © 2007 Elsevier Inc. All rights reserved.
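The set-covering view of the matching step can be illustrated with a minimal sketch. The abstract does not say which SCP heuristic the paper applies, so the classic greedy rule used here (repeatedly pick the storage resource that holds the most still-uncovered data sets) is an assumption, not the authors' method. The names `greedy_set_cover`, `assign_sources`, and the resource and data set labels are hypothetical, and the sketch ignores compute-resource selection and transfer costs.

```python
from typing import Dict, Iterable, List, Set


def greedy_set_cover(required: Iterable[str],
                     holdings: Dict[str, Set[str]]) -> List[str]:
    """Pick storage resources until every required data set is covered.

    `holdings` maps a (hypothetical) storage resource name to the data
    sets replicated on it. Greedy selection is an assumed heuristic.
    """
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        # Resource covering the most uncovered data sets; iterating in
        # sorted order makes ties resolve to the alphabetically first name.
        best = max(sorted(holdings),
                   key=lambda r: len(holdings[r] & uncovered),
                   default=None)
        gained = holdings[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError(f"no resource holds: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


def assign_sources(required: Iterable[str],
                   holdings: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each required data set to the first chosen resource holding it."""
    required = list(required)
    chosen = greedy_set_cover(required, holdings)
    return {d: next(r for r in chosen if d in holdings[r]) for d in required}
```

A job that needs four data sets replicated over four storage resources could then be matched by calling `assign_sources(["d1", "d2", "d3", "d4"], holdings)`. Using few storage resources per job is only a proxy for low transfer overhead; a fuller treatment would also weigh bandwidth to the chosen compute resource.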
Similar resources
A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability
A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids focus on only one kind of Grid job, which can be data...
A Set Coverage-based Ma Scheduling Distributed Data- on Global G
Data-intensive Grid applications need access to large datasets that may each be replicated on different resources. Minimizing the overhead of transferring these datasets to the resources where the applications are executed requires that appropriate computational and data resources be selected. In this paper, we introduce a heuristic for the selection of resources based on a solution to the Set ...
Multi-objective and Scalable Heuristic Algorithm for Workflow Task Scheduling in Utility Grids
Utility Grids provide a cyber-infrastructure for using services transparently in a distributed environment. Quality of Service parameters such as allocation cost and makespan must be taken into account when scheduling workflow application tasks in Utility Grids. Optimizing both of these parameters is a challenge in a distributed environment and may conflict one an...
An Economy-based Algorithm for Scheduling Data-Intensive Applications on Global Grids
Data Grids have become the de facto platform for the next generation of eScience experiments that will be carried out through large collaborations spread around the world. As the number of entities within a data grid increases, scheduling of applications in order to make the most efficient use of the available resources such as computational, storage and network facilities becomes a challenge. ...
Scheduling distributed data-intensive applications on global grids
The next generation of scientific experiments and studies are being carried out by large collaborations of researchers distributed around the world engaged in analysis of huge collections of data generated by scientific instruments. Grid computing has emerged as an enabler for such collaborations as it aids communities in sharing resources to achieve common objectives. Data Grids provide servic...
Journal title:
- J. Parallel Distrib. Comput.
Volume 68
Publication date 2008